GE Aviation - Remaining Useful Life Analysis
Part 1 - Data Preparation
Getting Started
Connecting to the database
dbListTables(mysqlconnection)
[1] "engine_data_aic" "engine_data_axm"
[3] "engine_data_fron" "engine_data_pgt"
[5] "esn_rul" "lkp_airport_codes_t"
[7] "manufacturing_sql_by_esn"
Reading the tables
esn_rul = dbReadTable(mysqlconnection, "esn_rul")
engine_data_aic = dbReadTable(mysqlconnection, "engine_data_aic")
engine_data_axm = dbReadTable(mysqlconnection, "engine_data_axm")
engine_data_fron = dbReadTable(mysqlconnection, "engine_data_fron")
engine_data_pgt = dbReadTable(mysqlconnection, "engine_data_pgt")
lkp_airport_codes_t = dbReadTable(mysqlconnection, "lkp_airport_codes_t")
manufacturing_sql_by_esn = dbReadTable(mysqlconnection, "manufacturing_sql_by_esn")
Top observations for each dataset
engine_data_aic

| dataset | esn | unit | flight_cycle | datetime | operator | depart_icao | destination_icao | hpc_eff_mod | hpc_flow_mod | tra | t2 | t24 | t30 | t50 | p2 | p15 | p30 | nf | nc | epr | ps30 | phi | nrf | nrc | bpr | farb | htbleed | nf_dmd | pcnfr_dmd | w31 | w32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| test_FD001 | 999120 | 20 | 1 | 2018-02-01T13:47:42.000Z | AIC | LFBO | LFBO | 0.0011 | -5e-04 | 100 | 518.67 | 182.59 | 1588.38 | 1397.780 | 14.62 | 21.61 | 554.43 | 2388.07 | 9072.24 | 1.3 | 47.40 | 521.88 | 2388.06 | 8147.58 | 8.3923 | 0.03 | 393 | 2388 | 100 | 38.76 | 23.4970 |
| test_FD001 | 999120 | 20 | 2 | 2018-02-01T18:22:24.000Z | AIC | LFBO | VIDP | -0.0001 | -3e-04 | 100 | 518.67 | 182.56 | 1590.73 | 1405.880 | 14.62 | 21.61 | 553.62 | 2388.06 | 9059.50 | 1.3 | 47.32 | 522.03 | 2388.06 | 8151.74 | 8.4385 | 0.03 | 392 | 2388 | 100 | 38.90 | 23.4240 |
| test_FD001 | 999120 | 20 | 3 | 2018-02-08T00:46:00.000Z | AIC | VIDP | VABB | -0.0015 | 3e-04 | 100 | 518.67 | 182.54 | 1590.70 | 1401.310 | 14.62 | 21.60 | 554.24 | 2388.08 | 9064.83 | 1.3 | 47.19 | 521.80 | 2388.03 | 8146.37 | 8.4234 | 0.03 | 392 | 2388 | 100 | 38.96 | 23.4460 |
| test_FD001 | 999120 | 20 | 4 | 2018-02-08T03:58:00.000Z | AIC | VABB | VIDP | -0.0022 | 1e-04 | 100 | 518.67 | 183.01 | 1588.91 | 1403.280 | 14.62 | 21.61 | 553.79 | 2388.06 | 9058.04 | 1.3 | 47.28 | 522.25 | 2388.06 | 8144.65 | 8.3955 | 0.03 | 391 | 2388 | 100 | 38.94 | 23.3412 |
| test_FD001 | 999120 | 20 | 5 | 2018-02-08T07:14:00.000Z | AIC | VIDP | VABB | -0.0064 | -1e-04 | 100 | 518.67 | 182.73 | 1588.43 | 1406.460 | 14.62 | 21.61 | 553.90 | 2388.06 | 9064.03 | 1.3 | 47.42 | 522.17 | 2388.04 | 8146.81 | 8.4371 | 0.03 | 393 | 2388 | 100 | 38.94 | 23.4452 |
| test_FD001 | 999120 | 20 | 6 | 2018-02-08T11:23:00.000Z | AIC | VABB | VOCI | 0.0002 | 5e-04 | 100 | 518.67 | 182.64 | 1583.41 | 1410.055 | 14.62 | 21.61 | 554.59 | 2388.06 | 9065.79 | 1.3 | 47.21 | 521.51 | 2388.00 | 8144.26 | 8.4084 | 0.03 | 394 | 2388 | 100 | 39.11 | 23.2680 |

engine_data_axm

| dataset | esn | unit | flight_cycle | datetime | operator | depart_icao | destination_icao | hpc_eff_mod | hpc_flow_mod | tra | t2 | t24 | t30 | t50 | p2 | p15 | p30 | nf | nc | epr | ps30 | phi | nrf | nrc | bpr | farb | htbleed | nf_dmd | pcnfr_dmd | w31 | w32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| train_FD001 | 999062 | 62 | 1 | 2018-03-01T08:25:50.000Z | AXM | WMKK | VHHH | 0.0026 | 1e-04 | 100 | 518.67 | 642.78 | 1595.36 | 1410.055 | 14.62 | 21.61 | 553.33 | 2388.19 | 9060.39 | 1.3 | 47.60 | 521.77 | 2388.15 | 8139.18 | 8.4882 | 0.03 | 393 | 2388 | 100 | 38.77 | 23.2630 |
| train_FD001 | 999062 | 62 | 2 | 2018-03-01T13:03:43.000Z | AXM | VHHH | WMKK | -0.0010 | 5e-04 | 100 | 518.67 | 642.50 | 1591.94 | 1411.070 | 14.62 | 21.61 | 552.99 | 2388.12 | 9060.53 | 1.3 | 47.65 | 521.75 | 2388.13 | 8136.55 | 8.4135 | 0.03 | 393 | 2388 | 100 | 38.67 | 23.3297 |
| train_FD001 | 999062 | 62 | 3 | 2018-03-01T22:26:54.000Z | AXM | WMKK | VYYY | 0.0031 | 0e+00 | 100 | 518.67 | 643.11 | 1598.06 | 1410.285 | 14.62 | 21.61 | 552.63 | 2388.10 | 9060.52 | 1.3 | 47.59 | 521.22 | 2388.06 | 8134.15 | 8.4183 | 0.03 | 393 | 2388 | 100 | 38.76 | 23.1726 |
| train_FD001 | 999062 | 62 | 4 | 2018-03-01T23:33:03.000Z | AXM | VYYY | VYYY | -0.0023 | -2e-04 | 100 | 518.67 | 642.68 | 1591.31 | 1410.655 | 14.62 | 21.61 | 553.47 | 2388.15 | 9057.44 | 1.3 | 47.50 | 521.20 | 2388.11 | 8137.75 | 8.4488 | 0.03 | 392 | 2388 | 100 | 38.99 | 23.3329 |
| train_FD001 | 999062 | 62 | 5 | 2018-03-02T02:35:17.000Z | AXM | VYYY | WMKK | -0.0033 | 4e-04 | 100 | 518.67 | 642.24 | 1590.59 | 1410.515 | 14.62 | 21.61 | 553.95 | 2388.14 | 9050.12 | 1.3 | 47.62 | 521.38 | 2388.05 | 8140.60 | 8.4512 | 0.03 | 393 | 2388 | 100 | 38.84 | 23.2909 |
| train_FD001 | 999062 | 62 | 6 | 2018-03-02T09:08:00.000Z | AXM | ZPPP | WMKK | -0.0005 | -4e-04 | 100 | 518.67 | 642.25 | 1590.12 | 1413.290 | 14.62 | 21.61 | 553.16 | 2388.15 | 9056.04 | 1.3 | 47.49 | 521.41 | 2388.16 | 8134.03 | 8.4759 | 0.03 | 393 | 2388 | 100 | 38.94 | 23.2215 |

engine_data_fron

| dataset | esn | unit | flight_cycle | datetime | operator | depart_icao | destination_icao | hpc_eff_mod | hpc_flow_mod | tra | t2 | t24 | t30 | t50 | p2 | p15 | p30 | nf | nc | epr | ps30 | phi | nrf | nrc | bpr | farb | htbleed | nf_dmd | pcnfr_dmd | w31 | w32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| train_FD001 | 999050 | 50 | 1 | 2018-01-06T12:01:09.000Z | FRON | KMCO | KMSY | -0.0029 | -2e-04 | 100 | 518.67 | 642.66 | 1591.79 | 1401.30 | 14.62 | 21.60 | 554.60 | 2388.01 | 9064.12 | 1.3 | 47.47 | 521.68 | 2388.06 | 8151.49 | 8.4158 | 0.03 | 393 | 2388 | 100 | 38.80 | 23.3016 |
| train_FD001 | 999050 | 50 | 2 | 2018-01-06T13:41:00.000Z | FRON | KMSY | KSAT | -0.0002 | -5e-04 | 100 | 518.67 | 642.28 | 1587.84 | 1404.96 | 14.62 | 21.61 | 553.60 | 2388.06 | 9065.83 | 1.3 | 47.33 | 522.12 | 2388.07 | 8142.72 | 8.4467 | 0.03 | 392 | 2388 | 100 | 38.99 | 23.3440 |
| train_FD001 | 999050 | 50 | 3 | 2018-01-06T14:41:18.000Z | FRON | KMSY | KSAT | -0.0010 | -5e-04 | 100 | 518.67 | 642.21 | 1586.89 | 1404.47 | 14.62 | 21.61 | 554.31 | 2388.05 | 9065.63 | 1.3 | 47.48 | 521.96 | 2388.05 | 8139.14 | 8.4424 | 0.03 | 393 | 2388 | 100 | 38.91 | 23.3190 |
| train_FD001 | 999050 | 50 | 4 | 2018-01-06T16:14:00.000Z | FRON | KSAT | KSAN | -0.0061 | -2e-04 | 100 | 518.67 | 643.19 | 1587.36 | 1398.90 | 14.62 | 21.61 | 554.35 | 2388.07 | 9059.91 | 1.3 | 47.30 | 522.31 | 2388.04 | 8145.16 | 8.4504 | 0.03 | 393 | 2388 | 100 | 38.95 | 23.3161 |
| train_FD001 | 999050 | 50 | 5 | 2018-01-06T17:12:52.000Z | FRON | KSAT | KSAN | -0.0002 | 1e-04 | 100 | 518.67 | 642.47 | 1584.96 | 1406.08 | 14.62 | 21.61 | 554.03 | 2388.00 | 9073.29 | 1.3 | 47.44 | 522.05 | 2388.05 | 8145.35 | 8.3822 | 0.03 | 392 | 2388 | 100 | 38.83 | 23.3256 |
| train_FD001 | 999050 | 50 | 6 | 2018-01-06T20:21:00.000Z | FRON | KSAN | KSAT | -0.0003 | -3e-04 | 100 | 518.67 | 641.82 | 1585.30 | 1399.30 | 14.62 | 21.61 | 554.38 | 2388.02 | 9068.48 | 1.3 | 47.13 | 522.17 | 2388.04 | 8144.13 | 8.4180 | 0.03 | 393 | 2388 | 100 | 38.80 | 23.4777 |

engine_data_pgt

| dataset | esn | unit | flight_cycle | datetime | operator | depart_icao | destination_icao | hpc_eff_mod | hpc_flow_mod | tra | t2 | t24 | t30 | t50 | p2 | p15 | p30 | nf | nc | epr | ps30 | phi | nrf | nrc | bpr | farb | htbleed | nf_dmd | pcnfr_dmd | w31 | w32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| train_FD001 | 999056 | 56 | 1 | 2018-01-01T12:33:13.000Z | PGT | LTBJ | LTCR | 0.0012 | -4e-04 | 100 | 518.67 | 642.75 | 1586.44 | 1412.720 | 14.62 | 21.61 | 552.68 | 2388.10 | 9059.10 | 1.3 | 47.72 | 521.18 | 2388.13 | 8136.92 | 8.4412 | 0.03 | 395 | 2388 | 100 | 38.81 | 23.2391 |
| train_FD001 | 999056 | 56 | 2 | 2018-01-01T15:40:21.000Z | PGT | LTCR | LTBJ | 0.0012 | -4e-04 | 100 | 518.67 | 642.47 | 1584.96 | 1410.405 | 14.62 | 21.61 | 552.90 | 2388.12 | 9057.99 | 1.3 | 47.42 | 520.82 | 2388.08 | 8133.11 | 8.4461 | 0.03 | 394 | 2388 | 100 | 38.82 | 23.3340 |
| train_FD001 | 999056 | 56 | 3 | 2018-01-01T18:23:01.000Z | PGT | LTBJ | LTFJ | 0.0026 | 5e-04 | 100 | 518.67 | 642.52 | 1587.64 | 1403.700 | 14.62 | 21.61 | 553.52 | 2388.13 | 9054.91 | 1.3 | 47.48 | 521.70 | 2388.12 | 8136.86 | 8.4357 | 0.03 | 394 | 2388 | 100 | 38.89 | 23.2844 |
| train_FD001 | 999056 | 56 | 4 | 2018-01-01T20:11:10.000Z | PGT | LTFJ | LTCG | 0.0034 | -2e-04 | 100 | 518.67 | 642.51 | 1587.80 | 1410.585 | 14.62 | 21.61 | 553.60 | 2388.13 | 9045.30 | 1.3 | 47.49 | 522.06 | 2388.18 | 8132.53 | 8.4411 | 0.03 | 394 | 2388 | 100 | 38.79 | 23.3204 |
| train_FD001 | 999056 | 56 | 5 | 2018-01-02T03:10:50.000Z | PGT | LTCG | LTFJ | 0.0024 | -1e-04 | 100 | 518.67 | 643.08 | 1593.15 | 1401.460 | 14.62 | 21.61 | 553.45 | 2388.03 | 9046.37 | 1.3 | 47.67 | 521.36 | 2388.16 | 8133.47 | 8.4824 | 0.03 | 394 | 2388 | 100 | 39.00 | 23.3592 |
| train_FD001 | 999056 | 56 | 6 | 2018-01-02T06:09:11.000Z | PGT | LTFJ | LTCN | 0.0010 | -2e-04 | 100 | 518.67 | 642.52 | 1589.19 | 1408.880 | 14.62 | 21.61 | 553.22 | 2388.12 | 9054.19 | 1.3 | 47.77 | 520.98 | 2388.13 | 8132.14 | 8.4382 | 0.03 | 393 | 2388 | 100 | 38.83 | 23.2568 |

esn_rul

| esn | rul |
|---|---|
| 999182 | 9 |
| 999115 | 93 |
| 999184 | 63 |
| 999113 | 104 |
| 999175 | 123 |
| 999197 | 95 |

lkp_airport_codes_t

| airport_icao | latitude | longitude |
|---|---|---|
| EDDN | 49.499 | 11.078 |
| KSAT | 29.534 | -98.469 |
| KSLC | 40.788 | -111.978 |
| KTYS | 35.811 | -83.994 |
| LTAR | 39.814 | 36.903 |
| LWSK | 41.962 | 21.621 |

manufacturing_sql_by_esn

| esn | X44321P02_op016_median_first | X44321P02_op420_median_first | X54321P01_op116_median_first | X54321P01_op220_median_first | X65421P11_op232_median_first | X65421P11_op630_median_first |
|---|---|---|---|---|---|---|
| 999016 | 26.85684 | 11.77815 | 27.10170 | 22.76247 | 117.6866 | 152.3637 |
| 999049 | 19.58206 | 10.45221 | 33.77607 | 21.28657 | 193.9467 | 235.9931 |
| 999135 | 24.94090 | 10.08180 | 21.94967 | 28.45795 | 140.1707 | 190.1172 |
| 999140 | 21.43138 | 14.11467 | 33.67310 | 29.35572 | 203.6650 | 150.0929 |
| 999063 | 25.12926 | 15.95785 | 27.14931 | 27.86627 | 143.6061 | 208.8172 |
| 999089 | 19.19329 | 13.40339 | 26.16745 | 29.49466 | 217.1039 | 241.2999 |
Joining the data sets
First, the 4 engine datasets from the 4 operators were appended to create the engine_health dataset.
engine_health = rbind(engine_data_aic, engine_data_axm, engine_data_fron, engine_data_pgt)
Next, engine_health was merged with the remaining datasets to create a collective data frame df, specifically:
- manufacturing_sql_by_esn contains part numbers and operations of each engine.
- lkp_airport_codes_t contains the coordinates of each airport, used to calculate the distance of each flight.
  - After merging, the coordinate columns for depart_icao were renamed depart_latitude and depart_longitude, and similarly for destination_icao.
- esn_rul contains key-value pairs of esn and RUL.
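The double lookup against the airport table can also be written so that the coordinate columns already carry their final names when they arrive. The following is a minimal sketch on made-up toy data, assuming dplyr is installed; the names flights, airports, depart_lkp, and dest_lkp are hypothetical and not part of the original analysis.

```r
library(dplyr)

# Toy stand-ins for engine_health and lkp_airport_codes_t (values are made up)
flights <- data.frame(
  esn = c(1L, 1L),
  depart_icao = c("AAAA", "BBBB"),
  destination_icao = c("BBBB", "CCCC"),
  stringsAsFactors = FALSE
)
airports <- data.frame(
  airport_icao = c("AAAA", "BBBB"),
  latitude = c(10, 20),
  longitude = c(30, 40),
  stringsAsFactors = FALSE
)

# Rename the lookup columns before each join so no renaming is needed afterwards
depart_lkp <- rename(airports, depart_icao = airport_icao,
                     depart_latitude = latitude, depart_longitude = longitude)
dest_lkp <- rename(airports, destination_icao = airport_icao,
                   destination_latitude = latitude, destination_longitude = longitude)

joined <- flights %>%
  left_join(depart_lkp, by = "depart_icao") %>%
  left_join(dest_lkp, by = "destination_icao")

# "CCCC" is not in the lookup, so its destination coordinates come back as NA
print(joined)
```

Because left_join keeps every flight, an airport code that is blank or absent from the lookup shows up as NA coordinates rather than a dropped row.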
df = left_join(engine_health, manufacturing_sql_by_esn, by = 'esn')
df = left_join(df, lkp_airport_codes_t, by=c('depart_icao'='airport_icao'))
colnames(df)[which(names(df) == "latitude")]= 'depart_latitude'
colnames(df)[which(names(df) == "longitude")]= 'depart_longitude'
df = left_join(df, lkp_airport_codes_t, by=c('destination_icao'='airport_icao'))
colnames(df)[which(names(df) == "latitude")]= 'destination_latitude'
colnames(df)[which(names(df) == "longitude")]= 'destination_longitude'
df = left_join(df, esn_rul, by = 'esn')
Calculate Distance
The flight distance was estimated (in kilometers) from the coordinates of the departure and destination airports using the distVincentyEllipsoid function from the geosphere package, which calculates the shortest distance between two points (the great-circle distance) according to the Vincenty (ellipsoid) method. The function returns meters, so the result was divided by 1000. Note that this estimate does not necessarily reflect the actual distance flown, because no information on the flight path is available.
The location columns were dropped afterward, as they were no longer informative and dropping them reduces the number of variables to consider.
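As a sanity check on the estimated distances, a spherical approximation can be computed in base R. The sketch below uses the haversine formula, which is a different and simpler technique than the Vincenty ellipsoid method used in this analysis, so its results will typically differ by a fraction of a percent. The function name haversine_km and the mean Earth radius of 6371.0088 km are assumptions made for illustration.

```r
# Haversine great-circle distance in kilometres on a spherical Earth.
# Assumption: mean Earth radius of 6371.0088 km; inputs are in decimal degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371.0088) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Vectorised over all arguments; NA coordinates propagate to an NA distance
d <- haversine_km(c(0, 11.078, NA), c(0, 49.499, 29.534),
                  c(90, 11.078, -98.469), c(0, 49.499, 29.534))
print(d)
```

The first pair spans a quarter of the equator, the second pair is the same point twice (distance 0), and the third has a missing longitude and therefore yields NA.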
df$distance = mapply(
  function(long1, lat1, long2, lat2) distVincentyEllipsoid(c(long1, lat1), c(long2, lat2)) / 1000,
  df$depart_longitude, df$depart_latitude,
  df$destination_longitude, df$destination_latitude)
# Drop location columns
df %<>% select(-c('depart_icao', 'destination_icao',
                  'depart_latitude', 'depart_longitude',
                  'destination_latitude', 'destination_longitude'))
Check for tidy, technically correct, and consistent data
- Fill in NA values for blank cells that were not null.
  From my first validation run, I noticed that there were missing values in distance, as well as in the latitude and longitude columns. On closer inspection, these observations had blank cells in the ICAO columns (hence the NA values in the coordinate columns and in the calculated distance). These cells appeared to be non-null, likely because they contained empty strings or whitespace. This step therefore makes sure that any column with missing values is actually flagged as such in the validation report.
- Change variable types where necessary:
  - datetime as POSIXct
  - dataset, unit, operator as factor
  - tra, htbleed, nf_dmd, and pcnfr_dmd as numeric
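To illustrate why blank cells slip past a not-null check, here is a small base R sketch on made-up values. Trimming whitespace before the comparison is an extra precaution assumed here; the variable names icao and icao_clean are hypothetical.

```r
# Toy ICAO column with an empty string and a whitespace-only string (made-up values)
icao <- c("LFBO", "", "  ", "VIDP")

# Neither blank is NA yet, so a not-null check would pass
print(sum(is.na(icao)))

# Trim whitespace, then turn empty strings into NA
icao_clean <- trimws(icao)
icao_clean[icao_clean == ""] <- NA_character_

print(sum(is.na(icao_clean)))
```

After the conversion, both blank entries are counted as missing, which is what a validation report needs in order to flag them.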
glimpse(df) ## quick look at the data types
Rows: 30,004
Columns: 38
$ dataset <chr> "test_FD001", "test_FD001", "test_FD001",…
$ esn <int> 999120, 999120, 999120, 999120, 999120, 9…
$ unit <int> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 2…
$ flight_cycle <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13…
$ datetime <chr> "2018-02-01T13:47:42.000Z", "2018-02-01T1…
$ operator <chr> "AIC", "AIC", "AIC", "AIC", "AIC", "AIC",…
$ hpc_eff_mod <dbl> 0.0011, -0.0001, -0.0015, -0.0022, -0.006…
$ hpc_flow_mod <dbl> -5e-04, -3e-04, 3e-04, 1e-04, -1e-04, 5e-…
$ tra <int> 100, 100, 100, 100, 100, 100, 100, 100, 1…
$ t2 <dbl> 518.67, 518.67, 518.67, 518.67, 518.67, 5…
$ t24 <dbl> 182.59, 182.56, 182.54, 183.01, 182.73, 1…
$ t30 <dbl> 1588.38, 1590.73, 1590.70, 1588.91, 1588.…
$ t50 <dbl> 1397.780, 1405.880, 1401.310, 1403.280, 1…
$ p2 <dbl> 14.62, 14.62, 14.62, 14.62, 14.62, 14.62,…
$ p15 <dbl> 21.61, 21.61, 21.60, 21.61, 21.61, 21.61,…
$ p30 <dbl> 554.43, 553.62, 554.24, 553.79, 553.90, 5…
$ nf <dbl> 2388.07, 2388.06, 2388.08, 2388.06, 2388.…
$ nc <dbl> 9072.24, 9059.50, 9064.83, 9058.04, 9064.…
$ epr <dbl> 1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1…
$ ps30 <dbl> 47.40, 47.32, 47.19, 47.28, 47.42, 47.21,…
$ phi <dbl> 521.88, 522.03, 521.80, 522.25, 522.17, 5…
$ nrf <dbl> 2388.06, 2388.06, 2388.03, 2388.06, 2388.…
$ nrc <dbl> 8147.58, 8151.74, 8146.37, 8144.65, 8146.…
$ bpr <dbl> 8.3923, 8.4385, 8.4234, 8.3955, 8.4371, 8…
$ farb <dbl> 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03,…
$ htbleed <int> 393, 392, 392, 391, 393, 394, 391, 392, 3…
$ nf_dmd <int> 2388, 2388, 2388, 2388, 2388, 2388, 2388,…
$ pcnfr_dmd <int> 100, 100, 100, 100, 100, 100, 100, 100, 1…
$ w31 <dbl> 38.76, 38.90, 38.96, 38.94, 38.94, 39.11,…
$ w32 <dbl> 23.4970, 23.4240, 23.4460, 23.3412, 23.44…
$ X44321P02_op016_median_first <dbl> 23.25222, 23.25222, 23.25222, 23.25222, 2…
$ X44321P02_op420_median_first <dbl> 14.54578, 14.54578, 14.54578, 14.54578, 1…
$ X54321P01_op116_median_first <dbl> 22.15643, 22.15643, 22.15643, 22.15643, 2…
$ X54321P01_op220_median_first <dbl> 29.89512, 29.89512, 29.89512, 29.89512, 2…
$ X65421P11_op232_median_first <dbl> 188.632, 188.632, 188.632, 188.632, 188.6…
$ X65421P11_op630_median_first <dbl> 231.0214, 231.0214, 231.0214, 231.0214, 2…
$ rul <int> 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 1…
$ distance <dbl> 0.0000, 6783.8984, 1134.9964, 1134.9964, …
## Replace "" with NA in character columns only; na_if() needs matching types
df %<>% mutate(across(where(is.character), ~ na_if(.x, "")))
## Change variable types
df$datetime = ymd_hms(df$datetime)
df = mutate(df, across(.cols = c(unit), .fns = as.character))
df = mutate(df, across(.cols = c(dataset, unit, operator), .fns = as.factor))
df = mutate(df, across(.cols = c(tra, htbleed, nf_dmd, pcnfr_dmd), .fns = as.numeric))
Data validation
pacman::p_load(pointblank)
# Step 1: flag a step once 1% of its test units fail
act = action_levels(warn_at = 0.01, notify_at = 0.01)
# Step 2: create the agent
agent = create_agent(tbl = df, actions = act)
# Step 3: add the validation steps
agent %<>%
  ## technically correct checks
  col_is_posix(columns = 'datetime') %>%
  col_is_factor(columns = vars(dataset, unit, operator)) %>%
  col_is_numeric(columns = -c(1,2,3,4,5,6,37)) %>%
  col_is_integer(columns = vars(flight_cycle, rul)) %>%
  ## consistency checks
  col_vals_not_null(columns = c(1:ncol(df))) %>%
  col_vals_gte(columns = vars(t2, t24, t30, t50, nf, nc, phi, nrf, nrc, w31, w32, distance, rul,
                              X44321P02_op016_median_first, X44321P02_op420_median_first, X54321P01_op116_median_first,
                              X54321P01_op220_median_first, X65421P11_op232_median_first, X65421P11_op630_median_first),
               value = 0)
# Step 4: evaluate
results = interrogate(agent)
results
Pointblank Validation
[2023-02-20|15:11:51] data frame: df, WARN 0.01, STOP —, NOTIFY 0.01
Steps with identical results are grouped; column names follow the order of the validation calls above.

| STEP | VALIDATION | COLUMNS | VALUES | EVAL | UNITS | PASS | FAIL | W | S | N |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | col_is_posix() | datetime | — | ✓ | 1 | 1 (1.00) | 0 (0.00) | ○ | — | ○ |
| 2-4 | col_is_factor() | dataset, unit, operator | — | ✓ | 1 | 1 (1.00) | 0 (0.00) | ○ | — | ○ |
| 5-35 | col_is_numeric() | all except columns 1-6 and 37 | — | ✓ | 1 | 1 (1.00) | 0 (0.00) | ○ | — | ○ |
| 36-37 | col_is_integer() | flight_cycle, rul | — | ✓ | 1 | 1 (1.00) | 0 (0.00) | ○ | — | ○ |
| 38-73 | col_vals_not_null() | columns 1-36 | — | ✓ | 30K | 30K (1.00) | 0 (0.00) | ○ | — | ○ |
| 74 | col_vals_not_null() | rul | — | ✓ | 30K | 13K (0.44) | 17K (0.56) | ● | — | ● |
| 75 | col_vals_not_null() | distance | — | ✓ | 30K | 29K (0.97) | 1K (0.03) | ● | — | ● |
| 76-86 | col_vals_gte() | t2, t24, t30, t50, nf, nc, phi, nrf, nrc, w31, w32 | 0 | ✓ | 30K | 30K (1.00) | 0 (0.00) | ○ | — | ○ |
| 87 | col_vals_gte() | distance | 0 | ✓ | 30K | 29K (0.97) | 1K (0.03) | ● | — | ● |
| 88 | col_vals_gte() | rul | 0 | ✓ | 30K | 13K (0.44) | 17K (0.56) | ● | — | ● |
| 89-94 | col_vals_gte() | X44321P02_op016_median_first through X65421P11_op630_median_first | 0 | ✓ | 30K | 30K (1.00) | 0 (0.00) | ○ | — | ○ |

2023-02-20 15:11:52 EST, 2.1 s, 2023-02-20 15:11:54 EST
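The missing-value fractions in a validation report like this one can be cross-checked directly in base R. The sketch below runs on a made-up toy data frame (toy is a hypothetical stand-in for df) and applies the same 1% threshold used for warn_at and notify_at.

```r
# Toy data frame standing in for df (values are made up)
toy <- data.frame(
  distance = c(100, NA, 250, 300),
  rul = c(NA, NA, 16L, 16L),
  t2 = c(518.67, 518.67, 518.67, 518.67)
)

# Fraction of missing values per column, largest first
na_fraction <- sort(colMeans(is.na(toy)), decreasing = TRUE)
print(na_fraction)

# Columns whose missing fraction reaches the 1% threshold
flagged <- names(na_fraction)[na_fraction >= 0.01]
print(flagged)
```

Running the same two lines on the real data frame should reproduce the columns that the report marks with a filled circle.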
Imputation
The Pointblank validation above shows that a little over 3% of the distance values are missing, so these were imputed with the median distance before constructing a predictive model.
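Median imputation itself is a two-line operation. The following base R sketch uses made-up distances; keeping a was_imputed flag is an optional extra assumed here, not something the original analysis does.

```r
distance <- c(1134.9, NA, 6783.9, 0, NA, 2500.0)  # made-up values

# Median of the observed values only
median_distance <- median(distance, na.rm = TRUE)

# Keep a flag so the imputed rows can still be identified later (an optional extra)
was_imputed <- is.na(distance)
distance[was_imputed] <- median_distance

print(median_distance)
print(distance)
```

The median is less sensitive than the mean to a few very long or zero-length flights, which is one reason it is a common default for skewed variables.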
median_distance = median(df$distance, na.rm = TRUE)
df$distance = replace_na(df$distance, median_distance)
Exclude observations missing RUL
To train a regression model, the response variable must not be null. Observations without a RUL (about 56% of the rows, according to the validation report) were therefore dropped.
df %<>% drop_na(rul)
Aggregate Data
Given the insufficient information about the health status of each engine, the data was aggregated to the last flight cycle to capture the latest or averaged measures for RUL prediction.
For each engine, most measures, such as temperature and pressure, were averaged across all flight cycles to account for changes (both degradation and additional maintenance) in between flights. distance was aggregated as total to reflect the accumulated traveled distance.
hpc_eff_mod and hpc_flow_mod were input variables of the simulation that generated the raw data, so they were not included in the aggregation.
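The per-engine aggregation described above can be written more compactly with across(), which applies one function to many columns. This is a minimal sketch on made-up data with only two sensor columns, assuming dplyr is installed; toy and per_engine are hypothetical names.

```r
library(dplyr)

# Toy per-flight data for two engines (values are made up)
toy <- data.frame(
  esn = c(1L, 1L, 1L, 2L, 2L),
  flight_cycle = c(1L, 2L, 3L, 1L, 2L),
  t30 = c(1588, 1590, 1592, 1595, 1591),
  p30 = c(554, 553, 552, 553, 555),
  distance = c(100, 200, 300, 50, 150),
  rul = c(16L, 16L, 16L, 93L, 93L)
)

per_engine <- toy %>%
  group_by(esn) %>%
  summarise(
    last_flight_cycle = max(flight_cycle),
    # Average every sensor column in one call; names become mean_t30, mean_p30
    across(c(t30, p30), mean, .names = "mean_{.col}"),
    total_distance = sum(distance),
    rul = min(rul),
    .groups = "drop"
  )

print(per_engine)
```

Setting .groups = "drop" returns an ungrouped data frame and suppresses the grouping message that summarise() otherwise prints.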
df %<>% group_by(dataset, esn, unit, operator) %>%
  summarize(last_flight_cycle = max(flight_cycle),
            last_datetime = max(datetime),
            mean_tra = mean(tra),
            mean_t2 = mean(t2), mean_t24 = mean(t24), mean_t30 = mean(t30), mean_t50 = mean(t50),
            mean_p2 = mean(p2), mean_p15 = mean(p15), mean_p30 = mean(p30),
            mean_nf = mean(nf), mean_nc = mean(nc),
            mean_epr = mean(epr), mean_ps30 = mean(ps30), mean_phi = mean(phi),
            mean_nrf = mean(nrf), mean_nrc = mean(nrc), mean_bpr = mean(bpr),
            mean_farb = mean(farb), mean_htbleed = mean(htbleed),
            mean_nf_dmd = mean(nf_dmd), mean_pcnfr_dmd = mean(pcnfr_dmd),
            mean_w31 = mean(w31), mean_w32 = mean(w32),
            mean_X44321P02_op016 = mean(X44321P02_op016_median_first), mean_X44321P02_op420 = mean(X44321P02_op420_median_first),
            mean_X54321P01_op116 = mean(X54321P01_op116_median_first), mean_X54321P01_op220 = mean(X54321P01_op220_median_first),
            mean_X65421P11_op232 = mean(X65421P11_op232_median_first), mean_X65421P11_op630 = mean(X65421P11_op630_median_first),
            total_distance = sum(distance),
            rul = min(rul))
`summarise()` has grouped output by 'dataset', 'esn', 'unit'. You can override
using the `.groups` argument.
Export Data
The data was then exported for later use in the project.
write_csv(df, 'ge_data.csv')